(54) Multimode and multiple character string run length encoding method and apparatus.



(57) Improvements are made to standard run 
length encoding compression techniques to 
permit frequently occurring repeated bytes to 
be dynamically redefined or reset to a default 
value such as a blank character, repeated multi- 
ple byte units or strings to be more efficiently 
coded and run length encoded enhancements 
allow compression of data where characters are 
represented by multiple bytes. The Sequence 
Control Byte (SCB) is modified to communicate 
indications to a receiver that the compression 
mode of 1 to N bytes per character is being 
changed and to indicate what the change is or 
that a common master repeat character fre- 
quently encountered in data is being redefined 
to be another character or that characters are 
going to be encoded in multiple bytes. The SCB 
format which is well known in the prior art is 
modified to include specific bit patterns or 
codes in the first two bits of the SCB byte to 
indicate setting of the bytes per character en- 
coding mode to a different value, resetting the 
encoding mode to a default value or redefining 
a commonly repeated character or defining a 
character to be multiple bytes or a string of 
characters which may be multiple byte charac- 
ters. The other six bits of the SCB are assigned 
code values unused in the prior art to indicate 
the number of times that a defined character is 
to be repeated, whether a master character that 
has been defined is to be repeated or whether a 
character string is to be repeated. Two fields of 
data are thus formatted in the SCB with new
values to indicate to a receiver these new
criteria.

This invention relates to digital data communication
techniques and systems in general and more specifically
to data compression methods and apparatus
for sending data over a communication system and
interpreting it at a receiver, particularly data compression
of the Run Length Encoded type.

Run Length Encoding is an old and well known 
compression technique historically employed in many 
different systems. It relies on the notion that, partic- 
ularly in digital data, sequences of similar characters 
often occur in unbroken strings. Compression is ac- 
complished in the Run Length Encoding method by 
sending a control character indicating the identity of 
the repeated character and the number of times it is 
to be repeated. US Patent 4626829 and the plethora 
of prior art cited as references therein may be refer- 
red to for understanding the general Run Length En- 
coding schemes known to exist. 

Run Length Encoding compression algorithms in 
general use detect the repeated character bytes in an 
input string of digital data and replace them with two 
bytes of information, the first byte indicating the num- 
ber of repeats of the repeated character and the sec- 
ond byte representing the character or byte which is 
to be repeated. The first byte or control byte is com- 
monly called the "Sequence (or String) Control Byte" 
(SCB) and consists of a code field and a count field. 

A blank character generally encoded as X'40' is a 
common repeated byte. Repeats of a blank character 
are often encoded with a special code in the SCB so 
that only the SCB itself is required. This further
improves the compression ratio. The foregoing algorithm
and variations of it are implemented in IBM VTAM
products and in other subsystems provided not only
by IBM but by other companies.

Difficulties with the known prior art are that the 
frequently occurring repeat byte is difficult to redefine 
dynamically or to reset to a default value, repeated 
multiple byte units are difficult to encode efficiently 
and Run Length Encoding is presently not used to 
compress data where individual characters are represented
by multiple bytes, a situation commonly encountered
with compression of video data or Japanese
Kanji text.

Languages such as the Japanese Kanji require 
more than a single eight-bit byte to represent each 
character. Currently existing Run Length Encoded 
methods do not facilitate compression of Kanji text ef- 
fectively because they encompass only the compres- 
sion of single byte characters. Graphic character sets, 
such as those used in video input/output subsystems 
for personal computers and the like usually require 
more than one byte to represent each screen character
because the character that appears on the screen
is represented to a display adapter driver as a two-byte
pair representing the ASCII character and a one-byte
display attribute code which shows color intensity,
highlighting, etc.



For the foregoing types of data the Run Length 
Encoding methods employed fall short since they are 
unable to encode repeated characters where the 
characters themselves consist of multiple bytes. 

A coding mechanism for a frequently occurring
repeated character (a repeated byte in the
case of a one byte per character mode of operation)
exists in the prior art: the so-called "master" character,
which is indicated by the SCB alone utilizing a
specific bit pattern within the SCB. However, this
does not permit redefinition of the default or commonly
known repeat character.

Finally, repeated multiple character strings or
units often occur in digital data but the prior art Run
Length Encoding mechanisms do not permit compression
of a multicharacter unit or string where the
characters are not the same character repeated.

In light of the foregoing known difficulties with existing
prior art Run Length Encoding methods and apparatus,
it is an object of the present invention to provide
an improved RLE compression method and apparatus
which permits changing the mode of interpretation
from 1 to N bytes per character or switching to
a default mode of just one byte per character.

It is a further object of this invention to provide an
improved Run Length Encoding method and apparatus
which permit redefining the frequently occurring
repeat character at will.

Another object of the invention is to provide an
improved Run Length Encoding method and apparatus
in which multiple different character strings may
be encoded as a single unit and repeated in a highly
efficient compression code.

The invention is embodied by assigning new SCB
codes from among those currently unused or reserved
and by modifying the meanings of those already
assigned. The all zero value of the eight bit SCB,
which has been reserved in the past, is defined to
mean "reset mode to one byte per character" and to
indicate that the master character to be repeated for
a commonly occurring character is reset to a blank
(X'40') character. The SCB code 01000000 is redefined
to be interpreted as a command to set the mode
of operation from 1 to N bytes per character and to reset
the master character to an arbitrarily defined character.
The length (in bytes) of the new master character
follows the SCB, and the master character itself
follows the length byte. The SCB of 01aaabbb, where
a and b are arbitrary digital values, is redefined to
mean repeating a number of characters defined by
aaa a number of times defined by bbb, where aaa and
bbb are digital values from 2 to 7.

The SCB code 10mmmmmm is redefined to
mean the encoding of a repeat of the master character
from 1 to 63 times as represented by the digital value
of mmmmmm. Finally, the SCB code 11pppppp is
modified to mean repeating of the character following
the SCB for a count equal to the value of pppppp from
2 to 63.

The foregoing and other objects of the invention
not specifically enumerated are met in a preferred
embodiment of the invention further described and
illustrated with respect to the drawings in which:

Figure 1 illustrates the typical Run Length Encod- 
ing SCB assignments in the prior art and generally il- 
lustrates the process of Run Length Encoding. 

Figure 2 illustrates the preferred embodiment of
the invention in which new SCB values and their
meanings are defined for use in the general compression
process.

Figure 3 illustrates the preferred form of the hardware
embodiment of the implementation of the process
in computer code calls and register contents.

Figure 4 illustrates computer functional code as- 
signments for executing the compression or decom- 
pression process according to the preferred embodi- 
ment. 

Figure 5 illustrates an overall schematic of a data
communication system which may include Run
Length Encoding compression either in the host CPU
or in a modem as is well understood.

Figure 1 illustrates the well known Run Length
Encoding (RLE) technique as employed in the prior
art. In the top of Figure 1, a schematic representation
of source data in its raw or uncompressed form, consisting
of multiple 8-bit bytes as is conventional, is
represented. It is illustrated that the multiple bytes, which
are examined by a computer process of comparing
sequential bytes one against another to detect strings
of repeated like bytes, may be encoded in the RLE
compressed format as shown by the SCB code and
count fields representing one sequence control byte
of the Run Length Encoded compression technique
known in the art.

In the known art, the SCB codes which comprise
the first two bits of the SCB are assigned specific
meanings or functions, and the count field which
represents the other six bits in the SCB is also assigned
meaning or significance as shown in the explanation
portion of Figure 1. The output of the compression
process is also illustrated at the bottom of Figure 1,
where coded compressed data is assembled by the
transmitter, either in a CPU or in a modem as shown
in Figure 5, to incorporate a destination header or address
followed by the SCB data, which may be one or
two bytes, and the compressed data itself which follows.

At a receiver such as shown in Figure 5, Run
Length Encoded data is interpreted by examining the
SCB and decoding its significance. SCBs having values
of all 0, 01 and any arbitrary count, 10 and all zero
count, or 11 and count of 0 or 1 are reserved and not
used in the prior art. SCB value 00 followed by any
arbitrary count is used in the prior art to indicate that
characters are not repeated for a count equal to some
number from 1 to 63 bytes as represented by nnnnnn,
i.e. it is an indication that data from 1 to 63 bytes is
not compressed and does not consist of repeated
characters. The code 10 mmmmmm is interpreted to mean
a repeating of blank characters for a count of mmmmmm
up to 63 bytes, and the code of 11 pppppp is interpreted
to mean the repeating of the next occurring
byte following the SCB for a count equal to pppppp
between 2 and 63 bytes.
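
The prior art interpretation just described can be sketched as a small decoder. This is only an illustrative sketch, not the implementation of any product: the function name `decode_prior_art` and the choice to raise an error on reserved SCB values are assumptions made for the example, and the destination header shown in Figure 1 is not modeled.

```python
BLANK = 0x40  # the blank character X'40', the fixed repeat character in the prior art


def decode_prior_art(data: bytes) -> bytes:
    """Expand a prior art RLE stream made of SCBs and the bytes that follow them."""
    out = bytearray()
    i = 0
    while i < len(data):
        scb = data[i]
        code = scb >> 6      # first two bits of the SCB: the code field
        count = scb & 0x3F   # remaining six bits: the count field
        i += 1
        if code == 0b00 and count >= 1:
            # 00 nnnnnn: the next nnnnnn bytes are uncompressed data
            out += data[i:i + count]
            i += count
        elif code == 0b10 and count >= 1:
            # 10 mmmmmm: mmmmmm blanks, no data byte follows the SCB
            out += bytes([BLANK]) * count
        elif code == 0b11 and count >= 2:
            # 11 pppppp: repeat the byte following the SCB pppppp times
            out += bytes([data[i]]) * count
            i += 1
        else:
            # all zero, 01 with any count, 10 with zero count, 11 with count 0 or 1
            raise ValueError(f"reserved SCB value {scb:#04x}")
    return bytes(out)
```

For instance, the six-byte stream 02 41 42 85 C3 5A expands to ten bytes: two uncompressed bytes, five blanks and three copies of X'5A'.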

Turning to Figure 2, the preferred embodiment of 
the invention contemplates reassigning reserved 
SCB values and/or redefining the meaning to encom- 
pass the capability for altering the master repeat 
character, changing the mode of interpretation from 1 
to N bytes per character and allowing for the repeat- 
ing of any arbitrarily defined character which may be 
composed of any arbitrary number of bytes for an ar- 
bitrary count or number of occurrences. The overall 
RLE encoding process and apparatus are the same 
as those indicated for Figures 1 and 5, but new func- 
tions and significance are attributed to the SCB val- 
ues as will be described. 

As referred to earlier, some languages and data
communication environments require sending and
receiving of characters represented by multiple bytes.
Multiple byte characters are commonly encountered
in Japanese Kanji language or in video input/output
data with attribute bytes following the character
representation. Currently, any number from 1 to 4 bytes
may be required to represent a character in a given
field of application or language, and while 4 is the
currently known maximum, the general method of this
invention is applicable to any arbitrary value n so that
characters may be represented by any value from 1
to n bytes. The SCB value 01 000000 is utilized in the
present invention to set the mode of operation from 1
to n bytes per character and to indicate the setting of
the master control or commonly repeated character
to a new definition. The prior art reserved
SCB code of all zeros is redefined to mean a reset
of the mode of operation to 1 byte per character and a
reset of the master character to the commonly occurring
blank character.

In other words, code 00 with a count of 0 in the 
SCB first and second code fields indicates a reset op- 
eration in which the mode of operation (or mode of 
character encoding) is reset to the default value of 1 
byte per character and the master character which is 
to be repeated is reset to its default (hex) value of 
X'40' or blank. This particular SCB encoding may be
used to make the implementation upwardly compat- 
ible with existing implementations in the industry 
where the mode of character encoding is always 1 
byte per character and the master character or repeat 
character is always X'40' or blank and wherein a set- 
ting of the master character to some arbitrarily differ- 
ent character is not supported. 

Code 00 in the first field of the SCB with a non-zero
count from 1 to 63 specifies a sequence of non-replicated
characters (i.e. bytes, in the case of a 1
byte per character mode of operation) which follow the
SCB.

A code of 01 in the first field of the SCB followed 
by a count of 0 in the second field indicates the con- 
tents of the byte following the SCB will be used to de- 
fine the mode of character encoding operation from 
1 to n depending upon the content of the byte follow- 
ing the SCB. It also is used to indicate to the receiver 
that the character next following the mode defining 
byte will be interpreted to be the new master character 
or repeat character. 

Code 01 in the first field of the SCB followed by 
a count of aaabbb (in binary) indicates that the string 
of information following the SCB consists of a number 
of characters represented by the binary value of aaa 
which unit or string is to be repeated bbb times repre- 
sented in binary by the bbb portion of the count field 
of the SCB. Other combinations of the count field for 
this SCB having a 01 first field are presently re- 
served. An alternative embodiment may use the byte 
following the SCB to contain the string repeat count 
from 2 to 255, in which case the string could be rep- 
resented by the second field of the SCB to be any- 
thing from 2 to 63 characters in length. 
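
As an illustration of the 01 aaabbb assignment, the SCB for a repeated string can be assembled as follows. This is a minimal sketch for the 1 byte per character mode only; the helper names `string_repeat_scb` and `encode_string_run` are hypothetical and do not come from the figures.

```python
def string_repeat_scb(chars_in_string: int, repeats: int) -> int:
    """Build the 01 aaabbb SCB: aaa = characters in the string, bbb = repeat count."""
    if not (2 <= chars_in_string <= 7 and 2 <= repeats <= 7):
        # other combinations of the count field are described as reserved
        raise ValueError("aaa and bbb must each be in the range 2 to 7")
    return 0b01_000000 | (chars_in_string << 3) | repeats


def encode_string_run(unit: bytes, repeats: int) -> bytes:
    """Encode `unit` repeated `repeats` times, assuming 1 byte per character."""
    return bytes([string_repeat_scb(len(unit), repeats)]) + unit
```

Under these assumptions the six input bytes "ABABAB" become three output bytes: the SCB 01 010 011 (X'53') followed by "AB".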

The SCB code 10 in the first field with a count
field (the "second field") of 0 is reserved as in the prior
art. The first field coded 10 with a non-zero count
field representing a count (in binary) from 1 to 63 specifies
a sequence of master control characters to be
repeated for the count equal to what is shown in the
count field.

A code of 11 in the first field with a count of 0 or 
1 is currently reserved. A code of 11 with a non-zero 
count from 2 to 63 is interpreted to specify a sequence 
of replicated characters where the byte following the 
SCB is the character which is to be repeated for the 
number of times indicated by the count field in the 
SCB. 
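
The SCB assignments of Figure 2 described above can be drawn together in a decoder sketch. It is a hedged illustration rather than the patented apparatus: the name `decode_extended` is hypothetical, and it assumes that in an n byte per character mode every count is a count of characters, so that each uncompressed or repeated character occupies n bytes in the stream. Truncated input is not checked.

```python
def decode_extended(data: bytes) -> bytes:
    """Expand a stream that uses the redefined SCB values (assumptions in lead-in)."""
    mode = 1            # bytes per character; default is 1
    master = b"\x40"    # master character; default is blank X'40'
    out = bytearray()
    i = 0
    while i < len(data):
        scb = data[i]
        code, count = scb >> 6, scb & 0x3F
        i += 1
        if scb == 0x00:
            # 00 000000: reset mode to 1 byte per character, master to blank
            mode, master = 1, b"\x40"
        elif code == 0b00:
            # 00 nnnnnn: nnnnnn non-replicated characters follow
            n = count * mode
            out += data[i:i + n]
            i += n
        elif scb == 0x40:
            # 01 000000: next byte sets the mode, then the new master character
            mode = data[i]
            master = data[i + 1:i + 1 + mode]
            i += 1 + mode
        elif code == 0b01:
            # 01 aaabbb: a string of aaa characters repeated bbb times
            chars, reps = count >> 3, count & 0x07
            if chars < 2 or reps < 2:
                raise ValueError(f"reserved SCB value {scb:#04x}")
            n = chars * mode
            out += data[i:i + n] * reps
            i += n
        elif code == 0b10 and count >= 1:
            # 10 mmmmmm: repeat the current master character mmmmmm times
            out += master * count
        elif code == 0b11 and count >= 2:
            # 11 pppppp: repeat the character following the SCB pppppp times
            out += data[i:i + mode] * count
            i += mode
        else:
            raise ValueError(f"reserved SCB value {scb:#04x}")
    return bytes(out)
```

A stream beginning 40 02 30 31 83 would, under these assumptions, switch to 2 bytes per character, define the master character as X'3031' and then emit it three times; a following 00 returns the receiver to the prior art defaults.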

It is customary to implement RLE encoding either 
in a host computer of a PC or mainframe type or in mo- 
dems for compressing digital data for transmission 
over a communication line as is shown schematically 
in Figure 5. Assuming that the compression occurs in 
the processor incorporated in most modern modems 
or in the host processor, compression routines are 
available in application software and may utilize the 
following internal formats and structures as shown in 
Figures 3 and 4. 

In Figure 3, the compression operation call may
be encoded as an operation code in bits 0-16 in a machine
instruction with the general registers containing
the operand addresses specified in bits 24-27 and 28-31
for general registers R1 and R2, which will contain
the operand addresses of the first and second operands,
the significance of which will be discussed later.
A General Register GR0 is also indicated in Figure 3
as containing several fields including C which is a
continuation status bit, M which is a mode indicating
field of up to 2 bits and a function code field of up to
8 bits. A General Register GR1 contains the 32-bit
representation of the master character, and the General
Registers R1 and R1+1, R2 and R2+2 contain, respectively,
the first operand address and its length
and the second operand address and its length as
shown.

Figure 4 illustrates the function code assignments
that may be utilized in the processor where the
function code is represented in hexadecimal form
and the designated process is as shown in the table
of assignments. The compression or expansion operations
are the only ones assigned in the table of assignments
in Figure 4, with all other function codes
being reserved.

In operation, either a portion or all of the second
operand is fetched and processed and the result
placed in the first operand location. Whether a part
or all of the second operand is processed is indicated
by the condition code C. R1 and R2 registers are assumed
to designate non-zero even numbered registers
and bits 16-23 of the instruction compression call
are ignored in this example. For uniformity, unused
bits in all registers utilized in the instruction should be
all zero. The locations in memory of the leftmost byte
of the first operand and of the second operand are
specified by the contents of the General Registers R1
and R2, respectively. The contents of R1+1 and R2+2
contain the 32-bit signed integers specifying
the number of bytes in the respective operands. The
handling of addresses in the General Registers R1
and R2 is dependent upon the addressing mode employed
within the processor. Bit 0 (C) of General Register 0
is the continuation bit. A zero in this field indicates
that the compression operation must be started
from the beginning while a 1 indicates that the operation
of compression is a continuation of a previously
started compression operation so that the creation of
a new SCB at the output to designate the mode or the
master character is not required when continuation is
shown.

Bits 1 and 2 of General Register 0 are the M field
and specify the mode of operation. Since only two
bits are employed in this embodiment only four
modes may be indicated. The value of M indicates a
compression mode of 1 to 4 bytes per character, dependent
upon the language or type of characters being
compressed. A mode of M=0
indicates 1 byte per character and M=3 would indicate
4 bytes per character mode. The M field may be expanded
as desired to indicate any arbitrary number n
of bytes per character. The data to be compressed in
the raw data stream will dictate the mode of operation.

Bits 8-15 of the General Register 0 are the function
code and specify the operation. Run Length Encoding
compression and decompression, as supported in this
embodiment, are the only functions
assigned.

Access to the second operand, which is the
source of data, is performed using the commonly employed
program status word key, which fetches the
contents of the specified location and invokes the
processing according to the specified method or algorithm
within the processor. Storing of the results
in the first operand, or destination, is performed using
the access key K represented by bits 16-19 of General
Register 0.
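
The field positions given for General Register 0 can be illustrated with a small unpacking routine. This is a sketch under stated assumptions: bit 0 is taken to be the most significant bit of a 32-bit register, which the text does not state explicitly, and the function name `unpack_gr0` and the dictionary keys are hypothetical. The function code value used in the usage note is arbitrary, since the Figure 4 assignments are not reproduced here.

```python
def unpack_gr0(gr0: int) -> dict:
    """Split a 32-bit GR0 value into its C, M, function code and K fields."""

    def field(first_bit: int, width: int) -> int:
        # bit 0 is assumed to be the leftmost (most significant) bit
        shift = 32 - first_bit - width
        return (gr0 >> shift) & ((1 << width) - 1)

    m = field(1, 2)                        # bits 1-2: mode field M
    return {
        "continuation": field(0, 1),       # bit 0: C
        "bytes_per_char": m + 1,           # M=0 means 1 byte, M=3 means 4 bytes
        "function_code": field(8, 8),      # bits 8-15
        "access_key": field(16, 4),        # bits 16-19: K
    }
```

With these assumptions, a register value of X'E0015000' would unpack to C=1, 4 bytes per character, function code X'01' and access key 5.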

The contents of General Register 1 (GR1) specify
the Master Character MC to be used during a time
period of data compression. Depending upon the
mode of operation specified by the mode specifying
field M, MC will be defined as being 1 to 4 bytes. If
mode is 0, i.e. 1 byte per character, then bits 24-31 of
General Register 1 will specify the Master
Character MC. When M=3, i.e. 4 bytes per character,
then bits 0-31 of General Register 1 will specify the
Master Character MC. A particular Master Character
is specified at the initiation of the compression operation
and may be any arbitrary character known to be
frequently encountered in the specific type of data being
compressed.
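
The relationship between the M field and the bits of General Register 1 that hold the Master Character can be sketched as below. The text gives only the M=0 and M=3 cases; treating M=1 and M=2 as right-aligned in the same way (bits 16-31 and 8-31) is an assumption of this example, as is the function name `master_character`.

```python
def master_character(gr1: int, m: int) -> bytes:
    """Return the Master Character held in GR1 for mode field value m (0 to 3)."""
    if not 0 <= m <= 3:
        raise ValueError("M is a two-bit field, so m must be 0 to 3")
    width = m + 1  # bytes per character
    # take the rightmost `width` bytes: bits 24-31 for M=0, bits 0-31 for M=3
    return (gr1 & 0xFFFFFFFF).to_bytes(4, "big")[4 - width:]
```

For a register holding X'12345640', M=0 yields the single byte X'40' and M=3 yields all four bytes.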

When operation of the compression function is
complete or when it is interrupted, the length fields in
registers R1+1 and R2+2 are decremented by the
number of bytes that have been fully processed in the
two operands handled by the instructions. The addresses
in R1 and R2 are incremented by the same
amount. The field C in General Register 0 is set to
indicate whether an interruption or completion of operation
has occurred so that, when resumed,
the hardware may take appropriate action. C is set to
0 when the operation ends because the source input
has been completely processed. Condition code C=1
is set when the operation ends because the end of a
string or compression run has been reached but more
data remains to be compressed. If the end of the
source input raw data is reached simultaneously, then
condition code 0 would be set. Condition code 3 may
be set when unusual conditions preclude normal completion
of the compression operation, i.e. a hardware
or software failure.

When the length of a given operand is 0, no ac- 
cess exceptions are recognized and no compression 
or movement of data will take place, the condition 
code C being set to 0 or 1 as appropriate. 

For compression operations, the RLE method
places special characters called SCBs, as noted
above, into the first operand or output as the result of
its operation and designates sequences of bytes in
the second operand which is the input or raw data.
For expansion using the RLE process, the algorithm
will interpret the compressed data in the input or second
operand field consisting of SCBs and characters
following the SCBs and will regenerate the original
data as output into the first operand.

As noted earlier, the single byte SCB is broken 
into two fields or parts with bits 0-1 being the code 
field and bits 2-7 being a count field. The significance 
of the code and count fields and their method of interpretation
have already been described. Given the
general understanding of RLE compression and de- 
compression and the assignments of the code and 
count fields and their meaning as shown with the dis- 
cussion relative to Figure 2, it will be apparent to those 
of skill in the art that the invention may be easily em- 
ployed in existing RLE compression transmission or 
reception systems and that numerous departures, ex- 
tension or modification by way of accounting for dif- 
ferent types of character encoding modes of opera- 
tion may be easily facilitated utilizing the general tech- 
nique as described. 



Claims 

1. A method of Run Length Encoding digital data for
compressed transmission to a receiver for de- 
compression comprising steps at a transmitter of: 
generating a first sequence control byte (SCB) 
whenever a change in character byte encoding 
from 1 to n bytes per character is desired, said 
SCB having at least a first and a second control 
data field encoded so that said first field has the
digital value 01 and said second field has the dig- 
ital value 000000, said values indicating to a re- 
ceiver that a change in character byte encoding 
mode is to be carried out on characters received 
following said SCB; and 

encoding a second control byte to follow said 
SCB, said second control byte indicating to said 
receiver the number of bytes per character to be 
used in decompressing received data at said re- 
ceiver; and 

encoding a third control field to follow said sec- 
ond control byte, said third control field indicating 
to said receiver the identity of the master repeat 
character. 

2. A method as claimed in claim 1, wherein: 

said generating step of said first SCB is per- 
formed with its said second control data field en- 
coded as a digital value aaabbb, indicating to said 
receiver that said compressed digital data is to be 
interpreted as repeated multicharacter strings 
where aaa is a binary number in the range of 2- 
7 and indicates the number of characters in a giv- 
en string and bbb is a binary number in the range 
of 2-7 and indicates the number of times that the 
identified string is to be repeated. 

3. A method as claimed in claim 1, wherein: 

said generating step of said first SCB is performed
with its said first control data field encoded
as 00, indicating, in combination with said second
control data field, that the mode of character
byte encoding is to be reset to one byte per character
and that the master character default value
of blank is to be assumed.

4. A method as claimed in any one of claims 1-3,
wherein:

said generating step of said first SCB is performed
with its said first control data field encoded
as the digital value 10 and with said second
control data field encoded as a digital value
mmmmmm being a binary value from 1 to 63 indicating
the currently existing master character is
to be repeated for a count equal to the value of
mmmmmm in the decompressed form of the
data.

5. A method as claimed in any one of claims 1-3,
wherein:

said generating step of said first SCB is performed
with its said first control data field encoded
as a digital value 11 and said second field is
encoded as a digital value pppppp being a binary
value in the range of 2 to 63 indicating the character
that next follows SCB is to be repeated in
the decompressed data for a count equal to
pppppp times.

6. Apparatus for Run Length Encoding digital data
for compressed transmission to a receiver for decompression
comprising:

means for generating a first sequence control
byte (SCB) whenever a change in character byte
encoding from 1 to n bytes per character is desired,
said SCB having at least a first and a second
control data field encoded so that said first
field has the digital value 01 and said second field
has the digital value 000000, said values indicating
to a receiver that a change in character byte
encoding mode is to be carried out on characters
received following said SCB; and

means for encoding a second control byte to follow
said SCB, said second control byte indicating
to said receiver the number of bytes per character
to be used in decompressing received data at
said receiver; and

means for encoding a third control field to follow
said second control byte, said third control field
indicating to said receiver the identity of the master
repeat character.

7. Apparatus as claimed in claim 6, wherein:

said means for generating said first SCB encodes
said second control data field as a digital value
aaabbb, indicating to said receiver that said compressed
digital data is to be interpreted as repeated
multicharacter strings where aaa is a binary
number in the range of 2-7 and indicates the
number of characters in a given string and bbb is
a binary number in the range of 2-7 and indicates
the number of times that the identified string is to
be repeated.

8. Apparatus as claimed in claim 6, wherein: 

said means for generating said first SCB encodes 
said first control data field as 00, indicating, in 
combination with said second control data field, 
that the mode of character byte encoding is to be 
reset to one byte per character and that the mas- 
ter character default value of blank is to be as- 
sumed. 

9. Apparatus as claimed in any one of claims 6-8, 
wherein: 

said means for generating said first SCB encodes 
said first control data field as the digital value 10 
and said second control data field as a digital value
mmmmmm being a binary value from 1 to 63
indicating the currently existing master character
is to be repeated for a count equal to the value of
mmmmmm in the decompressed form of the
data. 

10. Apparatus as claimed in any one of claims 6-8, 
wherein: 

said means for generating said first SCB encodes 
said first control data field as a digital value 11 
and said second field as a digital value pppppp 
being a binary value in the range of 2 to 63 indicating
the character that next follows SCB is to be
repeated in the decompressed data for a count 
equal to pppppp times. 